# Querying and downloading cartographic material from loc.gov

<p class="lead">Table of Contents</p>

- [This notebook](#This-notebook)
- [About the Library of Congress's cartographic materials](#About-the-Library-of-Congress's-cartographic-materials)
- [1. Required Prep: Import all of the Python modules we'll need](#1.-Required-Prep:-Import-all-of-the-Python-modules-we'll-need)
- [2. Practice: Get familiar with the loc.gov API](#2.-Practice:-Get-familiar-with-the-loc.gov-API)
- [3. Practice: Understand how pagination works](#3.-Practice:-Understand-how-pagination-works)
- [4. Download files: retrieve files associated with any given query](#4.-Download-files:-retrieve-files-associated-with-any-given-query)
- [5. Examples: Advanced Querying](#5.-Advanced-Querying)


<div class="alert alert-info">

<p class="lead"> Skip ahead </p>

If you're just looking to bulk download files, check out sections 1 and 4.
</div>

<div class="alert alert-info">

<p class="lead">API Documentation</p>

Detailed documentation about the loc.gov API and syntax used to construct queries can be found at <a href="https://libraryofcongress.github.io/data-exploration/">About the loc.gov JSON API</a>. See especially the <a href="https://libraryofcongress.github.io/data-exploration/requests.html">Requests</a> page.

</div>

<div class="alert alert-info">
<p class="lead">More Resources</p>
    
Other Jupyter notebooks and examples can be found at <a href="https://labs.loc.gov/lc-for-robots/#try">LC for Robots</a>. In particular, be sure to check out <a href="https://github.com/LibraryOfCongress/data-exploration/blob/master/loc.gov%20JSON%20API/Accessing%20images%20for%20analysis.ipynb">Accessing images for image analysis</a>, which this notebook builds on, and <a href="https://github.com/charlie-moffett/data-exploration/blob/master/loc.gov%20JSON%20API/Extracting%20location%20data%20for%20geovisualization.ipynb">Extracting location data from the loc.gov API for geovisualization</a>. 

</div>

## This notebook
The Library of Congress makes a portion of its collections available online via [loc.gov](https://www.loc.gov). There are millions of items in these online collections, including maps and other cartographic items. With the help of the loc.gov API, these materials can be accessed in bulk for downloading, computational re-use, and analysis. 

This Jupyter notebook focuses on the cartographic materials on loc.gov, specifically how to download image files in bulk. It is part one in a series of Jupyter notebooks exploring how to computationally access, retrieve, and analyze cartographic collections on loc.gov. 

This notebook demonstrates methods to:
- perform bulk downloads of cartographic materials using the loc.gov API and Python
- craft advanced API queries
- perform post-query filtering


## About the Library of Congress's cartographic materials
The Library of Congress has one of the largest and most comprehensive cartographic collections in the world, numbering over 5.2 million maps, atlases, geospatial datasets, globes, globe cores, three-dimensional relief models, and other cartographic formats. There is even a chocolate map. The collection includes in-depth historical and modern coverage of the United States, as well as broad global coverage. Official topographic, geologic, soil, mineral, and resource maps and nautical and aeronautical charts are available for most countries of the world. 

The material available online via loc.gov is only a small portion of these collections, selected due to researcher demand, value, rarity, physical condition, ease of imaging, partnerships, and numerous other factors. When using the materials for computational analysis and statistical study, it is valuable to understand the context of their creation and factors that contribute to the scope and sampling of the online collection. For more information about the primary areas of coverage in the online map collection, please see the list of online collections containing maps at https://www.loc.gov/maps/collections/. 

 
<table style="width:100%">
  <tr>
    <th>
        <a href="https://www.loc.gov/resource/g4802c.ct001180/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd4:g4802:g4802c:ct001180/full/pct:3/0/default.jpg" alt="Portolan chart of the Pacific coast from Guatemala to northern Peru with the Galapagos Islands">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g4701g.ct009133/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd4:g4701:g4701g:ct009133/full/pct:2.5/0/default.jpg" alt="The Codex Quetzalecatzin">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g7824qm.gct00244/?st=gallery">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd7m:g7824m:g7824qm:gct00244:ca000004/full/pct:8/0/default.jpg" alt="Quanzhou Fu yu di tu shuo">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g8014p.ct001366/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd8:g8014:g8014p:ct001366/full/pct:4/0/default.jpg" alt="Plan de Pnom-Penh, 1920s">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g3200.ct001256/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd3:g3200:g3200:ct001256/full/pct:4/0/default.jpg" alt="Outline of post-war new world map, 1942">
        </a>
    </th>
  </tr>
  <tr>
    <th>
        <a href="https://www.loc.gov/resource/g3182m.ct003805/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd3:g3182:g3182m:ct003805/full/pct:3/0/default.jpg" alt="Mars, 1965">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g3984v.cw0291000/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd398:g3984:g3984v:cw0291000/full/pct:4/0/default.jpg" alt="Vicksburg National Military Park and Vicksburg National Cemetery, 1952">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g3201f.ct004057/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd3:g3201:g3201f:ct004057/full/pct:3/0/default.jpg" alt="United States collective defense arrangements, 1956">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g7625.ct002981/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd7:g7625:g7625:ct002981/full/pct:2.8/0/default.jpg" alt="South Asia. 10-63, 1963">
        </a>
    </th>
    <th>
        <a href="https://www.loc.gov/resource/g7154y.ct003521/">
        <img src="http://tile.loc.gov/image-services/iiif/service:gmd:gmd7:g7154:g7154y:ct003521/full/pct:3/0/default.jpg" alt="Erewani hatakagitsě 1920 tʻ">
        </a>
    </th>
  </tr>
</table> 



## 1. Required Prep: Import all of the Python modules we'll need

Python comes with a standard set of modules that can be used out of the box (a module is a reusable bundle of Python code, and a package is a collection of related modules). This built-in set is known as the [Python Standard Library](https://docs.python.org/3/library/). Most scripts need additional modules, which must be installed on your computer and then imported into your script, usually at the very top of the script. 

If you are using Jupyter Notebook via Anaconda, good news! All of the modules used in this script come pre-installed with Anaconda (check out the full list at https://docs.anaconda.com/anaconda/packages/pkg-docs/). 

However, if you get an error message when you run the cell below, you may need to install one or more of the modules from a terminal window. One of the easiest ways to install modules is with pip, as in this tutorial from W3Schools: https://www.w3schools.com/python/python_pip.asp
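If you'd like to find out ahead of time which modules are missing, here is a minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` when Python can't find a top-level module. The helper name `find_missing_modules` is hypothetical, not part of any library.

```python
import importlib.util

def find_missing_modules(names):
    """Return the module names that Python cannot find (hypothetical helper)."""
    # find_spec returns None when a top-level module isn't installed.
    return [name for name in names if importlib.util.find_spec(name) is None]

# The modules imported in the next cell.
required = ['requests', 'time', 'pandas', 'os', 'pprint', 're']
missing = find_missing_modules(required)
if missing:
    print('Install these with pip before continuing:', ', '.join(missing))
else:
    print('All required modules are installed.')
```

Anything it reports can be installed with `pip install <name>` (for example, `pip install pandas`).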


<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
import modules. 
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;

</div>

```python
import requests
import time
import pandas as pd
import os
import pprint
import re
```

## 2. Practice: Get familiar with the loc.gov API 
You'll want to create a query that the loc.gov API can understand. There are a few ways to construct your query. The simplest is to go to www.loc.gov and run a search. After the search runs, copy the URL from the address bar of your web browser. 

For example, if we were doing a search for Sanborn Fire Insurance maps of Arkansas from the 19th century, the URL we copy might look like this:

- `https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas`

There are two sections of this URL that contain query parameters. The first, which is optional, specifies a collection filter:

1. `/collections/sanborn-maps/` 

This filters the query down to items belonging to the collection called "Sanborn Maps". The landing page for that collection is at https://www.loc.gov/collections/sanborn-maps/about-this-collection/. 

The rest of the filters will be contained in the section that begins with the question mark:

2. `?dates=1800/1899&fa=location:arkansas`

This section of the URL contains most of our query parameters. In this example, the parameters can be broken down into a few sections:

- `?` - A single question mark always precedes the parameters.
- `dates=1800/1899` - This indicates that the item's date must fall between 1800 and 1899.
- `&` - The ampersand is used to connect each parameter to the next.
- `fa=location:arkansas` - The facet ("fa" means "facet") called "location" should equal "arkansas".
- `&fo=json` - This parameter isn't actually included above. We can add this to the end of our search URL to tell the API to give us the results in JSON ("fo" means "format"). This will return the results as JSON data, rather than an HTML webpage. Note that this parameter has an ampersand at the start, to connect it to the previous parameter.
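Because the query is just a base URL plus `name=value` pairs joined by `&`, you can also assemble it in code. A minimal sketch using the standard library's `urllib.parse.urlencode`; the helper `build_search_url` is hypothetical, and it keeps `/` and `:` unescaped (via `safe='/:'`) so the result matches what you would copy from the browser address bar.

```python
from urllib.parse import urlencode

def build_search_url(collection, **params):
    """Assemble a loc.gov collection search URL (hypothetical helper)."""
    base = 'https://www.loc.gov/collections/' + collection + '/'
    # Each keyword argument becomes one name=value pair, joined by "&".
    return base + '?' + urlencode(params, safe='/:')

url = build_search_url('sanborn-maps', dates='1800/1899',
                       fa='location:arkansas', fo='json')
print(url)
# https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json
```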

<div class="alert alert-info">

<p class="lead">&fo=json</p>

In your web browser address bar, you can add the format parameter `fo=json` to get the results as JSON data instead of an HTML webpage.

</div>




The cell below does four things:
1. defines our query URL, 
2. runs the query (aka, sends the query to the API), 
3. tells Python to interpret the results as JSON, and
4. tells Python to show us the first result in that JSON. 

When you run the cell below, the output you'll see is the first item in your search result, as JSON. The fields visible in the JSON will include much of the same information that you see in the HTML version of the same data: https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas.

<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
practice the API query, by searching for all Sanborn Fire Insurance atlases for Arkansas, between 1800 and 1899. 

The output will be the first item in your search results, as JSON.
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;
</div>

```python
# 1. Define your query URL.
p1_search = 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json'

# 2. Run the query using the API.
api_query = requests.get(p1_search)

# 3. Tell Python to read the results as JSON.
search_result = api_query.json()

# 4. Look at the first record in the results section.
first_result = search_result['results'][0]
pprint.pprint(first_result)
```

```
{'access_restricted': False,
 'aka': ['http://www.loc.gov/item/sanborn00192_001/',
         'http://hdl.loc.gov/loc.gmd/g4004am.g001921886',
         'http://www.loc.gov/resource/g4004am.g001921886/'],
 'campaigns': [],
 'date': '1886-06',
 'dates': ['1886-06-01T00:00:00Z', '1886-01-01T00:00:00Z'],
 'description': ['Jun 1886.   2.'],
 'digitized': True,
 'extract_timestamp': '2020-05-01T18:38:47.796Z',
 'group': ['sanborn'],
 'hassegments': True,
 'id': 'http://www.loc.gov/item/sanborn00192_001/',
 'image_url': ['https://tile.loc.gov/storage-services/service/gmd/gmd400m/g4004m/g4004am/g001921886/00192_1886-0001.gif',
               'https://tile.loc.gov/storage-services/service/gmd/gmd400m/g4004m/g4004am/g001921886/00192_1886-0001.gif#h=150&w=126',
               'https://tile.loc.gov/image-services/iiif/service:gmd:gmd400m:g4004m:g4004am:g001921886:00192_1886-0001/full/pct:12.5/0/default.jpg#h=956&w=806',
               'https://tile.loc.gov/image-services/iiif/service:gmd:gmd400m:g40
```

### How to read the JSON above

If you look closely at the JSON result above, you can find the same information you would see on the loc.gov webpages. Look for the 'title' and 'date' fields: together, these tell you the city, county, state, and date of the Sanborn atlas. Look for the 'url' field to get the link to the item on loc.gov. If you follow that link, you'll see this same record displayed as HTML. 
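To see those fields for every result on the page at once, you could pull them into small dictionaries. A minimal sketch; `summarize_results` is a hypothetical helper, and it uses `.get()` so that a result missing one of the fields yields `None` instead of raising a `KeyError`.

```python
def summarize_results(results):
    """Return one small dict per search result with just the
    'title', 'date', and 'url' fields (hypothetical helper)."""
    return [
        {'title': r.get('title'), 'date': r.get('date'), 'url': r.get('url')}
        for r in results
    ]
```

In this notebook, `pd.DataFrame(summarize_results(search_result['results']))` would then give a table with one row per item on the current page.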

## 3. Practice: Understand how pagination works
Try clicking on this link:

https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas

Notice that you get several pages of results. The same is true when you use the API to get the results as JSON; the API gives you one page at a time. By default, it gives you the first page. You must ask for page 2, 3, etc. 

When we ran this API query in the cell above, we got Page 1 of the results, as JSON (we only printed out the first result on that first page). As it happens, our search has 5 pages of results (25 items/page) and 104 total results. To get the rest, we'll have to ask the API for each of the other pages, 2 through 5. 
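Each page has its own URL, and the page links the API returns carry the page number in the `sp` parameter (for example, `&sp=2`). As a sketch, here is one way to set that parameter yourself with the standard library; the helper `page_url` is hypothetical.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def page_url(search_url, page):
    """Return search_url with its sp (page) parameter set to `page`."""
    parts = urlparse(search_url)
    query = parse_qs(parts.query)      # e.g. {'dates': ['1800/1899'], ...}
    query['sp'] = [str(page)]          # add or overwrite the page number
    # doseq=True writes each list value as its own name=value pair.
    new_query = urlencode(query, doseq=True, safe='/:')
    return urlunparse(parts._replace(query=new_query))

base = 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json'
for page in range(2, 6):
    print(page_url(base, page))
```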

The API makes this easy to do!

The JSON results `search_result` contains a section called "pagination", which describes the pages of our search results. This section tells us:
- which page we are looking at now, 
- what the next page is, 
- how many total pages there are,
- and more!


Run the cell below to print the "pagination" section of the results. Earlier, we saved our results to the variable `search_result`, so we can call the pagination section with `search_result['pagination']`.

<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
look at the "pagination" section from our practice API query. 
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;
</div>

```python
search_result['pagination']
```

```
{'from': 1,
 'results': '1 - 25',
 'last': 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json&sp=4',
 'total': 5,
 'previous': None,
 'perpage': 25,
 'perpage_options': [25, 50, 100, 150],
 'of': 104,
 'next': 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json&sp=2',
 'current': 1,
 'to': 25,
 'page_list': [{'url': None, 'number': 1},
  {'url': 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json&sp=2',
   'number': 2},
  {'url': 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json&sp=3',
   'number': 3},
  {'url': 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json&sp=4',
   'number': '...'}],
 'first': None}
```

Notice that the pagination section of our results has these fields: 

- The current page: `search_result['pagination']['current']` 
- URL to the next page: `search_result['pagination']['next']` 
- Total number of results: `search_result['pagination']['of']` 
- Total number of results per page: `search_result['pagination']['perpage']` 
- Total number of pages: `search_result['pagination']['total']` 
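The `total` page count is consistent with dividing the number of results by the page size and rounding up: 104 results at 25 per page gives ⌈104 / 25⌉ = 5 pages. A quick check, assuming the API always gives a partial last page its own page; `pages_needed` is a hypothetical helper.

```python
import math

def pages_needed(total_results, per_page):
    """Number of pages needed to hold total_results at per_page each."""
    return math.ceil(total_results / per_page)

print(pages_needed(104, 25))   # 5, matching 'total' above
print(pages_needed(104, 100))  # 2, if you request 100 results per page
```

This is why asking for a larger page size (100 is one of the listed `perpage_options`) cuts down the number of requests needed to walk through a search.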

<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
view specific parts of the "pagination" section.  
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;

</div>

```python
print('Current page:')
print(search_result['pagination']['current'])

print('\nPath to request the next page:')
print(search_result['pagination']['next'])

print('\nTotal number of results:')
print(search_result['pagination']['of'])

print('\nTotal number of results per page:')
print(search_result['pagination']['perpage'])

print('\nTotal number of pages:')
print(search_result['pagination']['total'])
```

```
Current page:
1

Path to request the next page:
https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:arkansas&fo=json&sp=2

Total number of results:
104

Total number of results per page:
25

Total number of pages:
5
```
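Putting this together: to collect every page, keep requesting the `next` URL until it is `None`. A minimal sketch as a generator; `iter_pages` is a hypothetical helper, and it takes a `fetch` function (any function that takes a URL and returns the parsed JSON) so the paging logic stays separate from the HTTP library.

```python
import time

def iter_pages(first_url, fetch, pause=2):
    """Yield each page of JSON results, starting at first_url and
    following pagination['next'] until it is None (the last page)."""
    url = first_url
    while url is not None:
        page = fetch(url)
        yield page
        url = page['pagination']['next']
        if url is not None:
            # Pause between requests to go easy on the API.
            time.sleep(pause)
```

In this notebook, one way to gather all results would be `all_results = [r for page in iter_pages(p1_search, lambda u: requests.get(u).json()) for r in page['results']]`. The download functions in the next section follow the same `next`-link idea.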


## 4. Download files: retrieve files associated with any given query
<p class="lead">(1 of 2) First, we will create three functions.</p>

A "function" is a set of directions. Creating a function doesn't do anything yet; it just defines what will happen when we run the function later. 

Functions usually want you to supply some information ("arguments"). Then the function runs a series of steps on that information and gives back new information to you.

We'll define a function for each of three tasks:
1. **get_item_ids** - Run a loc.gov API search and get a list of results (paginating through all pages). 
2. **get_image_urls** - Get a list of image URLs from those results. 
3. **get_image_files** - Download all the image URLs. 

The code here borrows partially from [Accessing images for image analysis](https://github.com/LibraryOfCongress/data-exploration/blob/master/loc.gov%20JSON%20API/Accessing%20images%20for%20analysis.ipynb), which is another great resource.

The functions in this notebook are large and do many things. In the real world, best practice for Python functions is usually to break them up into multiple small functions, where each function does one thing.
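For instance, the step inside `get_image_urls` that standardizes the file format could live in its own small function. A sketch; `normalize_mimetype` is a hypothetical name, it mirrors the mapping used in the cell below (`tif` to `tiff`, `jpg` to `jpeg`, PDF as `application/pdf`, everything else as `image/...`), and the lowercasing is an addition of this sketch.

```python
def normalize_mimetype(extension):
    """Turn a file extension such as 'jpg' or 'tif' into a MIME type
    string such as 'image/jpeg' (hypothetical helper)."""
    aliases = {'jpg': 'jpeg', 'tif': 'tiff'}
    ext = extension.lower()
    ext = aliases.get(ext, ext)
    if ext == 'pdf':
        return 'application/pdf'
    return 'image/' + ext
```

Each piece can then be tested on its own, for example `normalize_mimetype('jpg')` returns `'image/jpeg'`.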

<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
define functions (aka, create instructions) for downloading files.
    
This cell won't download any files.  
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;

</div>

```python
def get_item_ids(url, items=None, conditional='True'):
    """Run a loc.gov search and return a list of item IDs (URLs),
    following the pagination through every page of results.

    `conditional` is a string of Python code evaluated with eval()
    for each search result (available as `result`); results for which
    it is False are skipped. Only pass strings you trust.
    """
    if items is None:
        items = []
    # Check that the query URL is not an item or resource link.
    exclude = ["loc.gov/item", "loc.gov/resource"]
    if any(string in url for string in exclude):
        raise ValueError('Your URL points directly to an item or '
                         'resource page (you can tell because "item" '
                         'or "resource" is in the URL). Please use '
                         'a search URL instead. For example, instead '
                         'of "https://www.loc.gov/item/2009581123/", '
                         'try "https://www.loc.gov/maps/?q=2009581123".')

    # Request pages of 100 results at a time.
    params = {"fo": "json", "c": 100, "at": "results,pagination"}
    while url is not None:
        call = requests.get(url, params=params)
        # Check that the API request was successful and returned JSON.
        if call.status_code != 200 or 'json' not in call.headers.get('content-type', ''):
            print('There was a problem. Try running the cell again, or check your searchURL.')
            break
        data = call.json()
        for result in data['results']:
            original_format = result.get("original_format") or []
            # Filter out anything that's a collection or web page,
            # or that fails the user-supplied conditional.
            filter_out = ("collection" in original_format) \
                or ("web page" in original_format) \
                or (not eval(conditional))
            if not filter_out:
                # Get the link to the item record.
                item = result.get("id")
                # Filter out links to the Catalog or other platforms.
                if item and item.startswith("http://www.loc.gov/item"):
                    items.append(item)
        # Move on to the next page; "next" is None on the last page.
        url = data["pagination"]["next"]
    return items


def get_image_urls(id_list, mimetype, items=None):
    """Get a list of file URLs for the items in id_list.

    If an item has 2+ copies/pages, all copies/pages are included.
    The user selects the file format (e.g., 'tiff', 'jpeg', 'gif', 'pdf').
    Returns a list of dicts with 'image_url' and 'item_id' keys.
    """
    if items is None:
        items = []
    print('Generating a list of files to download . . . ')
    # Standardize any spelling varieties supplied by the user.
    if mimetype == 'tif':
        mimetype = 'tiff'
    if mimetype == 'jpg':
        mimetype = 'jpeg'
    if mimetype == 'pdf':
        full_mimetype = 'application/' + mimetype
    else:
        full_mimetype = 'image/' + mimetype
    params = {"fo": "json"}
    for item in id_list:
        call = requests.get(item, params=params)
        if call.status_code == 200:
            data = call.json()
        elif call.status_code == 429:
            print('Too many requests to API. Stopping early.')
            break
        else:
            # Wait, then retry once before skipping this item.
            try:
                time.sleep(15)
                call = requests.get(item, params=params)
                call.raise_for_status()
                data = call.json()
            except (requests.exceptions.RequestException, ValueError):
                print('Skipping: ' + item)
                continue
        for resource_index, resource in enumerate(data['resources']):
            resource_url = data['item']['resources'][resource_index]['url']
            # Each entry in resource['files'] is one page or copy, listing
            # the versions (formats and sizes) available for it.
            for index, file in enumerate(resource['files']):
                image_df = pd.DataFrame(file)
                try:
                    selected_format_df = image_df[image_df['mimetype'] == full_mimetype]
                    # Keep the last version listed in the requested format.
                    file_info = {
                        'image_url': selected_format_df.iloc[-1]['url'],
                        'item_id': item,
                    }
                    items.append(file_info)
                except (KeyError, IndexError):
                    print('Note: No ' + mimetype + ' files found in '
                          + resource_url + '?sp=' + str(index + 1))
        # Pause between requests
        time.sleep(2)
    print('\nFound ' + str(len(id_list)) + ' items')
    print('Found ' + str(len(items)) + ' files to download')
    return items


def get_image_files(image_urls_list, path):
    """Download every file in image_urls_list into a subfolder of
    `path` named after the item ID."""
    for row in image_urls_list:
        image_url = row['image_url']
        item_id = row['item_id']
        print('Downloading: ' + image_url)
        try:
            # Directory: a folder named after the item ID, e.g.
            # 'http://www.loc.gov/item/sanborn00192_001/' -> 'sanborn00192_001'
            id_prefix = item_id.split('/')[-2]
            directory = os.path.join(path, id_prefix)
            os.makedirs(directory, exist_ok=True)
            # IIIF URLs (jpegs) need to be parsed in a special way.
            if 'image-services/iiif' in image_url:
                # Find the URL section that begins "service:"; its last
                # ":"-separated part will be the filename.
                url_parts = image_url.split('/')
                regex = re.compile("service:.*")
                pointer = list(filter(regex.match, url_parts))
                filename = pointer[0].split(':')[-1]
                # Add the file extension (e.g., 'jpg' from 'default.jpg').
                ext = image_url.split('.')[-1]
                filename = filename + '.' + ext
            # Non-IIIF URLs are simpler: use the last part of the URL.
            else:
                filename = image_url.split('/')[-1]
            filepath = os.path.join(directory, filename)
            print('Saving as: ' + filepath)
            # Request the image and write it to disk in chunks.
            with requests.get(image_url, stream=True) as image_response:
                image_response.raise_for_status()
                with open(filepath, 'wb') as fd:
                    for chunk in image_response.iter_content(chunk_size=100000):
                        fd.write(chunk)
        except requests.exceptions.RequestException as e:
            print(e)
        # Pause between downloads
        time.sleep(6)
```
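`get_item_ids` evaluates its `conditional` string with Python's `eval()`, which will run any code in that string. For post-query filtering, a plain function is an alternative worth considering. A sketch of the same keep-or-drop decision; `keep_result` and its `predicate` argument are hypothetical and not part of the functions above.

```python
def keep_result(result, predicate=None):
    """Return True if a search result should be kept (hypothetical helper).

    Mirrors get_item_ids: drop collections and web pages, apply an
    optional predicate function (in place of an eval'd string), and
    keep only links to loc.gov item records.
    """
    original_format = result.get('original_format') or []
    if 'collection' in original_format or 'web page' in original_format:
        return False
    if predicate is not None and not predicate(result):
        return False
    return str(result.get('id', '')).startswith('http://www.loc.gov/item')
```

For example, `keep_result(r, predicate=lambda r: r.get('date', '').startswith('189'))` would keep only results whose date begins with "189".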
        

<p class="lead">(2 of 2) Second, let's run each of the three functions</p>

The cell below runs each of our three functions, which will download the files associated with our search.

First, though, we need to feed the functions three pieces of information. Enter `searchURL` and `fileExtension` at the top of the cell below:

1. **searchURL** - Your search URL, as described above. You can include "&fo=json" on the end or not. The function will add it if you've left it off.
2. **fileExtension** - The file format you'd like. Replace "gif" with another extension (e.g., tif, pdf, jp2, or jpg). If you enter a file format that the API can't find, the function will tell you that it couldn't find any files in that format.

You'll be asked to enter `saveTo` in another cell:

3. **saveTo** - The folder on your local computer where you'd like to save the files. 

<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
<strong>generate lists</strong> of files to download to your computer, for all GIFs of Sanborn maps from the 19th century for towns called "Springfield". No files will be downloaded. 
    
You can change the `searchURL` to another search, or change gif to another image file format (e.g., tiff or jpeg). At the bottom of this notebook are examples of how to construct other searches.
    
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;
</div>

```python
searchURL = 'https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:springfield'
fileExtension = 'gif'

# 1. get_item_ids
ids = get_item_ids(searchURL, items=[])

# 2. get_image_urls
image_urls_list = get_image_urls(ids, fileExtension, items=[])

print('\nList of files to be downloaded:')
for url in image_urls_list:
    print(url['image_url'])
```

```
Generating a list of files to download . . . 
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=74
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=75
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=76
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=77
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=78
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=79
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=80
Note: No gif files found in https://www.loc.gov/resource/g4104sm.g021631896/?sp=81
Note: No gif files found in https://www.loc.gov/resource/g4084sm.g4084sm_g069001894/?sp=58
Note: No gif files found in https://www.loc.gov/resource/g4084sm.g4084sm_g069001894/?sp=59
Note: No gif files found in https://www.loc.gov/resource/g4084sm.g4084sm_g069001894/?sp=60
Note: No gif file
```

Notice that a few atlas pages don't have GIF files; those pages are listed first in the output. They are followed by the number of items and files found, and then by the full list of files that will be downloaded.
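Those notes come from the selection step in `get_image_urls`: each page lists several versions of its file, and the function keeps the last one whose `mimetype` matches the requested format. When nothing matches, it prints the "No gif files found" note. The same selection without pandas, as a sketch; `pick_file_url` is a hypothetical helper that assumes each version is a dict with `mimetype` and `url` keys, as the function above does.

```python
def pick_file_url(versions, mimetype):
    """Return the URL of the last version with the given MIME type,
    or None if this page has no file in that format."""
    matches = [v['url'] for v in versions if v.get('mimetype') == mimetype]
    return matches[-1] if matches else None
```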

<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
download files. They will be saved to the folder in the `saveTo` variable below. The cell output above gives a preview of what's going to be downloaded below.
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;
</div>

<div class="alert alert-danger">
<p class="lead">Before you run</p> 

Where would you like to save the files? Edit the `saveTo` variable below. It should point to a folder on your computer, into which you'd like to download the files. Each item's files will be saved in their own subfolder, named after the item ID.

</div>

```python
saveTo = 'C:/Users/rtrent/jupyter-notebooks/sanborn-images/'

# 3. get_image_files
get_image_files(image_urls_list, saveTo)
```

```
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631884/02163_1884-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_001/02163_1884-0001.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631884/02163_1884-0002.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_001/02163_1884-0002.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631884/02163_1884-0003.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_001/02163_1884-0003.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631884/02163_1884-0004.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_001/02163_1884-0004.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631884/02163_1884-0005.gif
Saving as: C:/Users/rtrent/

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631890/02163_1890-0024.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_002/02163_1890-0024.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631890/02163_1890-0025.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_002/02163_1890-0025.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0000.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_003/02163_1896-0000.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_003/02163_1896-0001.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0002.gif
Saving as: C:/Users/rtrent/

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0037.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_003/02163_1896-0037.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0038.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_003/02163_1896-0038.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0039.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_003/02163_1896-0039.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0040.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn02163_003/02163_1896-0040.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0041.gif
Saving as: C:/Users/rtrent/

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd395m/g3954m/g3954sm/g032451893/03245_1893-0002.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03245_002/03245_1893-0002.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd395m/g3954m/g3954sm/g032451898/03245_1898-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03245_003/03245_1898-0001.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd395m/g3954m/g3954sm/g032451898/03245_1898-0002.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03245_003/03245_1898-0002.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd395m/g3954m/g3954sm/g032451898/03245_1898-0003.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03245_003/03245_1898-0003.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581886/03858_1886-0001.gif
Saving as: C:/Users/rtrent/

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0000.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0000.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0001.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0002.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0002.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0003.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0003.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0004.gif
Saving as: C:/Users/rtrent/

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0033.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0033.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0034.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0034.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0035.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0035.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0036.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0036.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0037.gif
Saving as: C:/Users/rtrent/
```

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0072.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0072.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0073.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0073.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0074.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0074.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0075.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn03858_002/03858_1896-0075.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd376m/g3764m/g3764sm/g038581896/03858_1896-0076.gif
Saving as: C:/Users/rtrent/

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g04881002/sb000110.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_002/sb000110.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g04881002/sb000120.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_002/sb000120.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g04881002/sb000130.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_002/sb000130.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g04881002/sb000140.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_002/sb000140.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g4164sm_g048811891/04881_1891-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_003/04

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g4164sm_g048811896/04881_1896-0012.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_004/04881_1896-0012.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g4164sm_g048811896/04881_1896-0013.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_004/04881_1896-0013.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g4164sm_g048811896/04881_1896-0014.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_004/04881_1896-0014.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g4164sm_g048811896/04881_1896-0015.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn04881_004/04881_1896-0015.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd416m/g4164m/g4164sm/g4164sm_g048811896/04881_1

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001886/06900_1886-0023.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_001/06900_1886-0023.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001886/06900_1886-0024.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_001/06900_1886-0024.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001886/06900_1886-0025.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_001/06900_1886-0025.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001886/06900_1886-0026.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_001/06900_1886-0026.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001891/06900_1

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001891/06900_1891-0033.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_002/06900_1891-0033.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001891/06900_1891-0034.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_002/06900_1891-0034.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001891/06900_1891-0035.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_002/06900_1891-0035.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001891/06900_1891-0036.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_002/06900_1891-0036.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001891/06900_1

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001894/06900_1894-0031.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_003/06900_1894-0031.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001894/06900_1894-0032.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_003/06900_1894-0032.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001894/06900_1894-0033.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_003/06900_1894-0033.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001894/06900_1894-0034.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn06900_003/06900_1894-0034.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd408m/g4084m/g4084sm/g4084sm_g069001894/06900_1

Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd396m/g3964m/g3964sm/g3964sm_g083801898/08380_1898-0005.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn08380_003/08380_1898-0005.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd375m/g3754m/g3754sm/g089501885/08950_1885-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn08950_001/08950_1885-0001.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd375m/g3754m/g3754sm/g089501885/08950_1885-0002.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn08950_001/08950_1885-0002.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd375m/g3754m/g3754sm/g089501894/08950_1894-0001.gif
Saving as: C:/Users/rtrent/jupyter-notebooks/sanborn-images/sanborn08950_002/08950_1894-0001.gif
Downloading: https://tile.loc.gov/storage-services/service/gmd/gmd375m/g3754m/g3754sm/g089501894/08950_1894-0002.gif
Saving as: C:/Users

<p class="lead">Congratulations, you've just downloaded your files!</p> 

Take a look at them on your local computer. Each atlas has its own folder, containing one GIF file per page. You should see files like these:

<table style="width:100%">
  <tr>
    <th>
        <img src="https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631884/02163_1884-0001.gif" alt="02163_1884-0001.gif">
    </th>
    <th>
        <img src="https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631890/02163_1890-0000.gif" alt="02163_1890-0000.gif">
    </th>
    <th>
        <img src="https://tile.loc.gov/storage-services/service/gmd/gmd410m/g4104m/g4104sm/g021631896/02163_1896-0000.gif" alt="02163_1896-0000.gif">
    </th>
    <th>
        <img src="https://tile.loc.gov/storage-services/service/gmd/gmd395m/g3954m/g3954sm/g032451893/03245_1893-0001.gif" alt="03245_1893-0001.gif">
    </th>
    <th>
        <img src="https://tile.loc.gov/storage-services/service/gmd/gmd395m/g3954m/g3954sm/g032451898/03245_1898-0001.gif" alt="03245_1898-0001.gif">
    </th>
  </tr>
</table> 
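
To check a download run, a short script can count how many GIF pages landed in each atlas folder. This is a minimal sketch that assumes the folder layout shown in the log above (one subfolder per atlas under a root folder such as `sanborn-images`); the function name `count_pages_per_atlas` is hypothetical and is not part of this notebook's code.

```python
from pathlib import Path

def count_pages_per_atlas(root):
    """Return {atlas_folder_name: number_of_gif_files} for each subfolder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            # Each page of an atlas is saved as its own .gif file
            counts[folder.name] = len(list(folder.glob('*.gif')))
    return counts

if __name__ == '__main__':
    # Hypothetical root folder; point this at wherever you saved your files
    root = Path('sanborn-images')
    if root.is_dir():
        for atlas, pages in count_pages_per_atlas(root).items():
            print(f'{atlas}: {pages} pages')
```

Files directly under the root folder and non-GIF files inside atlas folders are ignored.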

## 5. Examples: Advanced Querying
<div class="alert alert-info">

<p class="lead">API Documentation</p>

Detailed documentation about the loc.gov API and syntax used to construct queries can be found at <a href="https://libraryofcongress.github.io/data-exploration/">About the loc.gov JSON API</a>. See especially the <a href="https://libraryofcongress.github.io/data-exploration/requests.html">Requests</a> page.

</div>

The files you've just downloaded are for all Sanborn Fire Insurance maps from towns called "Springfield" published in the 19th century.

Here are examples of other searches you can construct for Sanborn and other maps on loc.gov. To download files for any of these searches, set the `searchURL` variable in section 4 to the new query URL and re-run the two cells there.

## 19th century Sanborn maps of Springfield (any state)
1. To insert as `searchURL` above: https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:springfield
2. Query returned as HTML: https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:springfield
3. Query returned as JSON: https://www.loc.gov/collections/sanborn-maps/?dates=1800/1899&fa=location:springfield&fo=json
4. (**Advanced**) Same JSON query, expressed a different way:

    ```python
    import requests

    url = 'https://www.loc.gov/collections/sanborn-maps/'
    params = {
        'dates': '1800/1899',
        'fa': 'location:springfield',
        'fo': 'json'
    }
    response = requests.get(url, params=params)
    data = response.json()
    ```
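
The two forms are equivalent because `requests` URL-encodes the `params` dictionary into an equivalent query string (it may percent-encode characters such as `:` and `/`, which the server decodes). One way to confirm this without sending a request is to parse the query string of the hand-written URL with Python's standard `urllib.parse` module. This sketch uses only the Springfield URL from item 3.

```python
from urllib.parse import urlsplit, parse_qs

search_url = ('https://www.loc.gov/collections/sanborn-maps/'
              '?dates=1800/1899&fa=location:springfield&fo=json')

# parse_qs returns each parameter as a list of values; keep the first of each
query = parse_qs(urlsplit(search_url).query)
params = {key: values[0] for key, values in query.items()}
print(params)
# {'dates': '1800/1899', 'fa': 'location:springfield', 'fo': 'json'}
```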


## Sanborn maps of Springfield, Illinois (any date)
Cities (municipalities), counties, and states are all listed in the `location` field in the Sanborn collection, and the metadata does not distinguish among these three types of location. To search by two or more locations, add each one as a `location:` facet in the `fa` parameter, separated by a pipe (`|`), as in the URLs below.

1. To insert as `searchURL` above: https://www.loc.gov/collections/sanborn-maps/?fa=location:illinois|location:springfield
2. Query returned as HTML: https://www.loc.gov/collections/sanborn-maps/?fa=location:illinois|location:springfield
3. Query returned as JSON: https://www.loc.gov/collections/sanborn-maps/?fa=location:illinois|location:springfield&fo=json
4. (**Advanced**) Same JSON query, expressed a different way:

    ```python
    import requests

    url = 'https://www.loc.gov/collections/sanborn-maps/'
    params = {
        'fa': 'location:illinois|location:springfield',
        'fo': 'json'
    }
    response = requests.get(url, params=params)
    data = response.json()
    ```
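
Because multiple locations are combined with `|` inside a single `fa` value, it can help to build that value from a list of place names. The helper name `location_facet` is hypothetical; it only reproduces the string format used in the URLs above.

```python
def location_facet(*locations):
    """Join location names into one `fa` value, e.g. 'location:a|location:b'."""
    return '|'.join(f'location:{name}' for name in locations)

params = {
    'fa': location_facet('illinois', 'springfield'),
    'fo': 'json',
}
print(params['fa'])  # location:illinois|location:springfield
```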



## Sanborn maps of New York City, New York
Like those of many cities, the borders and composition of New York City have not been static. For example, Brooklyn (now a borough of New York City, coterminous with Kings County) was an independent city from the early 19th century until 1898, and the Bronx was annexed into New York City in pieces. Other cities may have similar histories of mergers, divisions, expanding and contracting borders, and changing names and designations.

Sanborn atlases are described according to the location at the time of their publication, as printed on the atlas. 

Atlases covering modern-day New York City can be found under their borough rather than under "New York City" (at the time an atlas was originally published, its borough may have been an independent city or county).

The API does not support "OR" queries. To obtain all maps of New York City, you will need to run a separate search for each of the names its boroughs and counties have had over time (entered here as keyword queries with `q`, rather than as `location` facets):
1. Bronx https://www.loc.gov/collections/sanborn-maps/?fa=location:new+york&q=bronx
2. Brooklyn https://www.loc.gov/collections/sanborn-maps/?fa=location:new+york&q=brooklyn
3. Manhattan https://www.loc.gov/collections/sanborn-maps/?fa=location:new+york&q=manhattan
4. Queens https://www.loc.gov/collections/sanborn-maps/?fa=location:new+york&q=queens
5. Staten Island https://www.loc.gov/collections/sanborn-maps/?fa=location:new+york&q=staten+island

The results of these five searches can then be deduplicated using the unique identifier in each record's `id` field.
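
A minimal sketch of that deduplication, assuming the results of each search have already been collected as lists of records (dictionaries with an `id` key). The function name `dedupe_by_id` and the sample records are hypothetical placeholders, not real search results.

```python
def dedupe_by_id(*result_lists):
    """Merge several lists of records, keeping the first record seen for each id."""
    seen = set()
    merged = []
    for results in result_lists:
        for record in results:
            if record['id'] not in seen:
                seen.add(record['id'])
                merged.append(record)
    return merged

# Placeholder records standing in for results from two borough searches
brooklyn = [{'id': 'item-a'}, {'id': 'item-b'}]
queens = [{'id': 'item-b'}, {'id': 'item-c'}]
print([r['id'] for r in dedupe_by_id(brooklyn, queens)])  # ['item-a', 'item-b', 'item-c']
```

Using a set of seen IDs keeps the merge linear in the total number of records and preserves the order in which records first appear.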

Alternatively, a curated list of all NYC, NY maps can also be found at https://www.loc.gov/rr/geogmap/sanborn/city.php?CITY=New%20York&stateID=39. 


## Sanborn maps in the physical collections, but not necessarily online
Generally, any map that has been cataloged, even if no copy is available online, has a brief descriptive record retrievable via the loc.gov API. Although individual Sanborn sheets have not been cataloged, each atlas has a record on loc.gov, even if a copy of the atlas isn't online. To retrieve all records, not only those with online copies, append `all=true` to your query. For example, to retrieve records for all of the Library's California Sanborn atlases, try:

```
https://www.loc.gov/collections/sanborn-maps/?fa=location:california&all=true
```
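
The same query can be written as a `params` dictionary, in the style of the other examples in this section (with `fo=json` added to request JSON). This sketch only builds the URL with the standard library so it runs without network access; passing the same dictionary to `requests.get(base_url, params=params)` should produce an equivalent request.

```python
from urllib.parse import urlencode

base_url = 'https://www.loc.gov/collections/sanborn-maps/'
params = {
    'fa': 'location:california',
    'all': 'true',   # include records that have no online copy
    'fo': 'json',
}
# urlencode percent-encodes ':' as %3A; the server decodes it
query_url = f'{base_url}?{urlencode(params)}'
print(query_url)
```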

## Maps by Civil War cartographer Jedediah Hotchkiss
1. To insert as `searchURL` above: https://www.loc.gov/maps/?fa=contributor:hotchkiss,+jedediah
2. Query returned as HTML: https://www.loc.gov/maps/?fa=contributor:hotchkiss,+jedediah
3. Query returned as JSON: https://www.loc.gov/maps/?fa=contributor:hotchkiss,+jedediah&fo=json
4. (**Advanced**) Same JSON query, expressed a different way:

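    ```python
    import requests

    url = 'https://www.loc.gov/maps/'
    params = {
        # Write a literal space here; requests encodes it as '+' in the URL
        'fa': 'contributor:hotchkiss, jedediah',
        'fo': 'json'
    }
    response = requests.get(url, params=params)
    data = response.json()
    ```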
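
The `+` in the Hotchkiss URLs is URL encoding for a space. In a `params` dictionary, write the space itself: Python's `urlencode` (which `requests` relies on to encode `params`) turns a space into `+`, but turns a literal `+` into `%2B`, which the server reads as a plus sign rather than a space. The standard library shows the difference:

```python
from urllib.parse import urlencode, parse_qs

# Literal space: encoded as '+', which decodes back to a space
good = urlencode({'fa': 'contributor:hotchkiss, jedediah'})
# Literal '+': encoded as '%2B', which decodes to a plus sign, not a space
bad = urlencode({'fa': 'contributor:hotchkiss,+jedediah'})

print(good)                     # fa=contributor%3Ahotchkiss%2C+jedediah
print(bad)                      # fa=contributor%3Ahotchkiss%2C%2Bjedediah
print(parse_qs(good)['fa'][0])  # contributor:hotchkiss, jedediah
```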
 

## Panoramic maps
Panoramic maps have been organized into a collection, [Panoramic Maps](https://www.loc.gov/collections/panoramic-maps/about-this-collection/). Panoramic "maps" include drawings but not photographs. 
1. To insert as `searchURL` above: https://www.loc.gov/collections/panoramic-maps/
2. Query returned as HTML: https://www.loc.gov/collections/panoramic-maps/
3. Query returned as JSON: https://www.loc.gov/collections/panoramic-maps/?fo=json
4. (**Advanced**) Same JSON query, expressed a different way:

    ```python
    import requests

    url = 'https://www.loc.gov/collections/panoramic-maps/'
    params = {
        'fo': 'json'
    }
    response = requests.get(url, params=params)
    data = response.json()
    ```
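
Note that `fo=json` must be joined to the URL with `?` when the URL has no other parameters, and with `&` when it does. A small helper can handle both cases. `as_json_url` is a hypothetical name, and this sketch uses only `urllib.parse` from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def as_json_url(search_url):
    """Return search_url with exactly one fo=json parameter added."""
    parts = urlsplit(search_url)
    # Keep existing parameters, dropping any earlier 'fo' value
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'fo']
    query.append(('fo', 'json'))
    # urlunsplit inserts the '?' whenever the query string is non-empty
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))

print(as_json_url('https://www.loc.gov/collections/panoramic-maps/'))
# https://www.loc.gov/collections/panoramic-maps/?fo=json
```

Existing parameters are re-encoded, so characters such as `:` and `|` come back percent-encoded; the server treats them the same.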


## Advanced: Post-query filtering 
Item metadata on loc.gov includes information that the API can't query directly. For example, Sanborn records include a page count, recorded as the number of resource files, but the API can't search on this field.

To filter by fields like this, run an API query for the broader set of records you're interested in, and then filter the results yourself. Make sure, though, that the initial query returns a manageable number of records.

Our `get_item_ids` function accepts an optional `conditional` parameter for this kind of filtering: a true/false expression, passed as a string. The function returns only the results for which that expression is true.

Here's an example. Say you want a list of New Jersey Sanborn atlases that are at least 100 pages long. The conditional needs to check that the number of resource files is greater than 99. Because of the way we've written the `get_item_ids` function, the conditional should start with `result`:

```
result['resources'][0]['files']>99
```

Because our code wraps the conditional in single quotes, the inner quote marks need to be escaped:

```
conditional = 'result[\'resources\'][0][\'files\']>99'
```
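
The body of `get_item_ids` isn't shown in this section, so the following is only a sketch of how a string conditional like this could be applied, assuming it is evaluated with Python's built-in `eval()` against each result. The `filter_results` function and the sample records are hypothetical. It also shows that wrapping the string in double quotes produces the same string without any backslashes.

```python
# Hypothetical sample records shaped like the fields the conditional reads
sample_results = [
    {'id': 'http://www.loc.gov/item/example_a/', 'resources': [{'files': 150}]},
    {'id': 'http://www.loc.gov/item/example_b/', 'resources': [{'files': 1}]},
]

# Same string as the escaped version; double quotes outside avoid the backslashes
conditional = "result['resources'][0]['files']>99"
assert conditional == 'result[\'resources\'][0][\'files\']>99'

def filter_results(results, conditional):
    """Return ids of results for which the conditional string is true.

    The conditional must refer to each record as `result`. eval() runs
    arbitrary code, so only pass conditionals you wrote yourself.
    """
    return [result['id'] for result in results
            if eval(conditional, {}, {'result': result})]

print(filter_results(sample_results, conditional))
# ['http://www.loc.gov/item/example_a/']
```

A design alternative is to pass a function such as `lambda result: result['resources'][0]['files'] > 99`, which avoids both the quote escaping and `eval()`.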


<div class="alert alert-success">
<p class="lead"> Run the next cell to: </p> 
    
view a list of IDs for New Jersey Sanborn atlases with at least 100 pages.

After you've run it, try changing the conditional and re-running so that you only get 1-page results.
    
&#8595; &#8595; &#8595; &#8595; &#8595; &#8595;
</div>


```python
searchURL = 'https://www.loc.gov/collections/sanborn-maps/?fa=location:new+jersey'
conditional = 'result[\'resources\'][0][\'files\']>99'
ids = get_item_ids(searchURL, items=[], conditional=conditional)
for item_id in ids:
    print(item_id)
```

http://www.loc.gov/item/sanborn05408_004/
http://www.loc.gov/item/sanborn05408_005/
http://www.loc.gov/item/sanborn05408_006/
http://www.loc.gov/item/sanborn05464_001/
http://www.loc.gov/item/sanborn05464_002/
http://www.loc.gov/item/sanborn05469_002/
http://www.loc.gov/item/sanborn05469_003/
http://www.loc.gov/item/sanborn05469_005/
http://www.loc.gov/item/sanborn05469_008/
http://www.loc.gov/item/sanborn05502_001/
http://www.loc.gov/item/sanborn05511_010/
http://www.loc.gov/item/sanborn05568_008/
http://www.loc.gov/item/sanborn05568_011/
http://www.loc.gov/item/sanborn05571_006/
http://www.loc.gov/item/sanborn05571_007/
http://www.loc.gov/item/sanborn05571_008/
http://www.loc.gov/item/sanborn05571_009.2/
http://www.loc.gov/item/sanborn05571_009.4/
http://www.loc.gov/item/sanborn05571_009.6/
http://www.loc.gov/item/sanborn05571_009/
http://www.loc.gov/item/sanborn05571_010/
http://www.loc.gov/item/sanborn05583_002/
http://www.loc.gov/item/sanborn05583_003/
http://www.loc.gov/item/sanb

# Next
In the next Jupyter notebook in this series, we'll:
- retrieve metadata associated with your downloaded files
- retrieve metadata associated with the Sanborn Maps collection
- analyze and visualize metadata

Proceed to [Analyzing and visualizing cartographic metadata from loc.gov](maps-analyzing-metadata.ipynb) &#8594;